
    Delving Deep into the Sketch and Photo Relation

    Get PDF
    Sketches drawn by humans can play a similar role to photos in conveying shape, posture and fine-grained information, and this fact has stimulated a line of cross-domain research relating sketch and photo, including sketch-based photo synthesis and retrieval. In this thesis, we aim to further investigate the relationship between sketch and photo. More specifically, we study certain under-explored traits of this relationship and propose novel applications that reinforce the understanding of the sketch and photo relation.
    Our exploration starts with the problem of sketch-based photo synthesis, where the unique trait of non-rigid alignment between sketch and photo is overlooked in existing research. We then carry on with our investigation from a new angle and study whether sketch can facilitate photo classifier generation. Building upon this, we continue to explore how sketch and photo are linked at a more fine-grained level by tackling sketch-based photo segmenter prediction. Furthermore, we address the data scarcity issue identified in nearly all sketch-photo-related applications by examining their inherent semantic correlation, using sketch-based image retrieval (SBIR) as a test-bed. In all, we make four main contributions to research on the relationship between sketch and photo.
    Firstly, to mitigate the effect of deformation in sketch-based photo synthesis, we introduce the spatial transformer network into our image-to-image regression framework, which subtly deals with the non-rigid alignment between sketches and photos. Qualitative and quantitative experiments consistently reveal the superior quality of our synthesised photos over those generated by existing approaches.
    Secondly, sketch-based photo classifier generation is achieved with a novel model regression network, which maps a sketch to the parameters of a photo classification model. We show that the model regression network generalises across categories, so that photo classifiers for novel classes not involved in training are just a sketch away. Comprehensive experiments illustrate the promising performance of the generated binary and multi-class photo classifiers, and demonstrate that sketches can also be employed to enhance the granularity of existing photo classifiers.
    Thirdly, to achieve the goal of sketch-based photo segmentation, we propose a photo segmentation model generation algorithm that predicts the weights of a deep photo segmentation network according to the input sketch. The results confirm that a single sketch is the only prerequisite for unseen-category photo segmentation, and that segmentation performance can be further improved by using a sketch that is aligned with the object to be segmented in shape and position.
    Finally, we present an unsupervised representation learning framework for SBIR, the purpose of which is to eliminate the barrier imposed by data annotation scarcity. Prototype and memory bank reinforced joint distribution optimal transport is integrated into this framework, so that the mapping between sketches and photos can be discovered automatically and a semantically meaningful yet domain-agnostic feature space can be learned. Extensive experiments and feature visualisation validate the efficacy of our proposed algorithm.

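    To make the first contribution concrete, the snippet below is a minimal PyTorch sketch of a spatial transformer module that could warp an input sketch before it enters an image-to-image regressor. The layer sizes, the purely affine warp (a full model would arguably need a richer, non-rigid warp such as thin-plate splines) and the name SketchAligner are illustrative assumptions, not the thesis implementation.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class SketchAligner(nn.Module):
            """Toy spatial transformer: predict a warp from the sketch, then apply it."""
            def __init__(self):
                super().__init__()
                self.features = nn.Sequential(          # small localisation network (sizes arbitrary)
                    nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
                    nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
                self.fc = nn.Linear(32, 6)              # 2x3 affine parameters
                nn.init.zeros_(self.fc.weight)          # start from the identity transform
                with torch.no_grad():
                    self.fc.bias.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

            def forward(self, sketch):                  # sketch: (B, 1, H, W)
                theta = self.fc(self.features(sketch)).view(-1, 2, 3)
                grid = F.affine_grid(theta, sketch.size(), align_corners=False)
                return F.grid_sample(sketch, grid, align_corners=False)

        aligned = SketchAligner()(torch.rand(4, 1, 64, 64))  # warped sketches, same shape as input
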
    Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing

    Full text link
    Existing works on weakly-supervised audio-visual video parsing adopt the hybrid attention network (HAN) as the multi-modal embedding to capture cross-modal context. HAN embeds the audio and visual modalities with a shared network, where cross-attention is performed at the input. However, such early fusion highly entangles the two not-fully-correlated modalities and leads to sub-optimal performance in detecting single-modality events. To deal with this problem, we propose a messenger-guided mid-fusion transformer that reduces uncorrelated cross-modal context in the fusion. The messengers condense the full cross-modal context into a compact representation that preserves only useful cross-modal information. Furthermore, because microphones capture audio events from all directions while cameras only record visual events within a restricted field of view, unaligned cross-modal context from audio occurs more frequently in visual event prediction. We therefore propose cross-audio prediction consistency to suppress the impact of irrelevant audio information on visual event prediction. Experiments consistently illustrate the superior performance of our framework compared to existing state-of-the-art methods. Comment: WACV 202
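    As a rough illustration of the messenger idea, the hypothetical PyTorch sketch below lets a few learnable messenger tokens read the audio stream and then injects only that condensed context into the visual stream; the token count, dimensions and the class name MessengerFusion are assumptions rather than the authors' architecture.

        import torch
        import torch.nn as nn

        class MessengerFusion(nn.Module):
            """Condense audio context into a few messenger tokens before fusing with visual features."""
            def __init__(self, dim=256, n_msg=4, heads=4):
                super().__init__()
                self.msg = nn.Parameter(torch.randn(1, n_msg, dim) * 0.02)        # learnable messengers
                self.read = nn.MultiheadAttention(dim, heads, batch_first=True)   # messengers read audio
                self.write = nn.MultiheadAttention(dim, heads, batch_first=True)  # visual reads messengers

            def forward(self, visual, audio):           # visual: (B, Tv, dim), audio: (B, Ta, dim)
                msg = self.msg.expand(visual.size(0), -1, -1)
                msg, _ = self.read(msg, audio, audio)   # compact cross-modal summary
                fused, _ = self.write(visual, msg, msg) # inject only the condensed context
                return visual + fused                   # residual mid-fusion

        v, a = torch.rand(2, 10, 256), torch.rand(2, 20, 256)
        out = MessengerFusion()(v, a)                   # (2, 10, 256) visual stream with audio context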

    Generalized Few-Shot Point Cloud Segmentation Via Geometric Words

    Full text link
    Existing fully-supervised point cloud segmentation methods suffer in dynamic testing environments where new classes emerge. Few-shot point cloud segmentation algorithms address this problem by learning to adapt to new classes at the expense of segmentation accuracy on the base classes, which severely impedes their practicality. This largely motivates us to present the first attempt at a more practical paradigm, generalized few-shot point cloud segmentation, which requires the model to generalize to new categories with only a few support point clouds while simultaneously retaining the capability to segment base classes. We propose geometric words to represent geometric components shared between the base and novel classes, and incorporate them into a novel geometric-aware semantic representation to facilitate better generalization to the new classes without forgetting the old ones. Moreover, we introduce geometric prototypes to guide the segmentation with geometric prior knowledge. Extensive experiments on S3DIS and ScanNet consistently illustrate the superior performance of our method over baseline methods. Our code is available at: https://github.com/Pixie8888/GFS-3DSeg_GWs. Comment: Accepted by ICCV 202
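    The geometric-word idea can be pictured with the small sketch below: point features are softly assigned to a bank of geometric-word embeddings shared across base and novel classes, and the resulting geometry component is concatenated with the semantic features. The function, the tensor shapes and the omission of geometric prototypes are simplifying assumptions, not the released code.

        import torch
        import torch.nn.functional as F

        def geometric_aware_features(point_feats, geo_words, sem_feats):
            # point_feats: (N, d) per-point features, geo_words: (K, d) shared bank,
            # sem_feats: (N, d) output of the semantic branch.
            assign = F.softmax(point_feats @ geo_words.t(), dim=-1)  # (N, K) soft assignment
            geo = assign @ geo_words                                 # (N, d) geometry component
            return torch.cat([sem_feats, geo], dim=-1)               # geometry-aware representation

        feats = geometric_aware_features(torch.rand(1024, 64), torch.rand(32, 64), torch.rand(1024, 64))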

    Now You See Me: Deep Face Hallucination for Unviewed Sketches

    Get PDF

    End-To-End Semi-supervised Learning for Differentiable Particle Filters

    Full text link
    Recent advances in incorporating neural networks into particle filters provide the flexibility needed to apply particle filters in large-scale real-world applications. The dynamic and measurement models in this framework are learnable through the differentiable implementation of particle filters. Past efforts in optimising such models often require knowledge of the true states, which can be expensive to obtain or even unavailable in practice. In this paper, in order to reduce the demand for annotated data, we present an end-to-end learning objective based upon the maximisation of a pseudo-likelihood function, which can improve the estimation of states when a large portion of the true states is unknown. We assess the performance of the proposed method on state estimation tasks in robotics with simulated and real-world datasets. Comment: Accepted in ICRA 202
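    The training signal can be understood through the generic sketch below, which maximises the filter's own estimate of the observation likelihood rather than a supervised loss on ground-truth states. It is a simplified illustration under assumed interfaces; the paper's exact pseudo-likelihood construction and the full propagate/resample loop are not reproduced.

        import torch

        def negative_log_likelihood(observations, particles, weights, measurement_model):
            # observations: sequence of measurements; particles: (N, d); weights: (N,) normalised.
            nll = 0.0
            for y in observations:
                lik = measurement_model(particles, y)                 # (N,) differentiable p(y | particle)
                nll = nll - torch.log((weights * lik).sum() + 1e-12)  # log of the filter's likelihood estimate
                weights = weights * lik
                weights = weights / weights.sum()                     # reweight for the next step
                # a full filter would also propagate particles through the (learnable) dynamic
                # model and apply a differentiable resampling scheme here
            return nll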

    Sketch-based Video Object Segmentation: Benchmark and Analysis

    Full text link
    Reference-based video object segmentation is an emerging topic which aims to segment the target object in each video frame referred to by a given reference, such as a language expression or a photo mask. However, language expressions can be vague in conveying an intended concept and ambiguous when similar objects in one frame are hard to distinguish by language. Meanwhile, photo masks are costly to annotate and less practical to provide in real applications. This paper introduces the new task of sketch-based video object segmentation, an associated benchmark, and a strong baseline. Our benchmark includes three datasets, Sketch-DAVIS16, Sketch-DAVIS17 and Sketch-YouTube-VOS, which exploit human-drawn sketches as an informative yet low-cost reference for video object segmentation. We build on STCN, a popular baseline for the semi-supervised VOS task, and evaluate which design for incorporating a sketch reference is most effective. Experimental results show that sketches are more effective yet more annotation-efficient than other references, such as photo masks, language and scribbles. Comment: BMVC 202
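    One conceivable way to feed a sketch into a mask-driven baseline such as STCN is outlined below: the rasterised strokes are morphologically closed and hole-filled into a rough first-frame mask. Both the heuristic and the function name sketch_to_mask are illustrative assumptions only; the benchmark itself compares several designs for incorporating the sketch reference.

        import numpy as np
        from scipy import ndimage

        def sketch_to_mask(sketch, threshold=0.5, close_iters=3):
            # sketch: (H, W) float array with bright strokes on a dark background.
            strokes = sketch > threshold
            closed = ndimage.binary_closing(strokes, iterations=close_iters)  # bridge gaps between strokes
            return ndimage.binary_fill_holes(closed).astype(np.uint8)         # fill the enclosed region

        mask = sketch_to_mask(np.random.rand(480, 854))  # e.g. at DAVIS frame resolution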

    Sketch-a-Classifier: Sketch-based Photo Classifier Generation

    Get PDF
    Contemporary deep learning techniques have made image recognition a reasonably reliable technology. However, training effective photo classifiers typically requires numerous examples, which limits image recognition's scalability and its applicability to scenarios where images may not be available. This has motivated investigation into zero-shot learning, which addresses the issue via knowledge transfer from other modalities such as text. In this paper we investigate an alternative approach to synthesizing image classifiers: almost directly from a user's imagination, via free-hand sketch. This approach does not require the category to be nameable or describable via attributes, as in zero-shot learning. We achieve this by training a model regression network to map from free-hand sketch space to the space of photo classifiers. It turns out that this mapping can be learned in a category-agnostic way, allowing photo classifiers for new categories to be synthesized by a user with no need for annotated training photos. We also demonstrate that this modality of classifier generation can be used to enhance the granularity of an existing photo classifier, or as a complement to name-based zero-shot learning. Comment: published in CVPR 2018 as a spotlight
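    A minimal sketch of the model-regression idea follows: an MLP maps a sketch embedding to the weight vector and bias of a binary photo classifier, which is then applied to photo features. The dimensions, the two-layer MLP and the name ModelRegressionNet are placeholders, not the paper's exact architecture or training objective.

        import torch
        import torch.nn as nn

        class ModelRegressionNet(nn.Module):
            """Regress the parameters of a binary photo classifier from a sketch embedding."""
            def __init__(self, sketch_dim=512, photo_dim=512, hidden=1024):
                super().__init__()
                self.mlp = nn.Sequential(
                    nn.Linear(sketch_dim, hidden), nn.ReLU(),
                    nn.Linear(hidden, photo_dim + 1))   # weight vector plus bias

            def forward(self, sketch_emb, photo_feats):
                params = self.mlp(sketch_emb)           # (B, photo_dim + 1)
                w, b = params[:, :-1], params[:, -1:]
                return photo_feats @ w.t() + b.t()      # (N_photos, B) classifier scores

        scores = ModelRegressionNet()(torch.rand(1, 512), torch.rand(100, 512))  # one sketch, 100 photos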

    A nomogram-based optimized Radscore for preoperative prediction of lymph node metastasis in patients with cervical cancer after neoadjuvant chemotherapy

    Get PDF
    Purpose: To construct a superior single-sequence radiomics signature to assess lymphatic metastasis in patients with cervical cancer after neoadjuvant chemotherapy (NACT).
    Methods: The first half of the study was retrospectively conducted in our hospital between October 2012 and December 2021. Based on the history of NACT before surgery, patients were divided into the NACT and surgery groups. The incidence rate of lymphatic metastasis in the two groups was determined from the results of pathological examination following lymphadenectomy. Patients from the primary and secondary centers who received NACT were enrolled for radiomics analysis in the second half of the study. The patient cohorts from the primary center were randomly divided into training and test cohorts at a ratio of 7:3. All patients underwent magnetic resonance imaging after NACT. Segmentation was performed on T1-weighted imaging (T1WI), T2-weighted imaging, contrast-enhanced T1WI (CET1WI), and diffusion-weighted imaging.
    Results: The rate of lymphatic metastasis in the NACT group (33.2%) was significantly lower than that in the surgery group (58.7%, P=0.007). The area under the receiver operating characteristic curve values of Radscore_CET1WI for discriminating lymph node metastasis from non-metastasis were 0.800 and 0.797 in the training and test cohorts, respectively, exhibiting superior diagnostic performance. Among the clinical variables, tumor diameter on magnetic resonance imaging was combined with Radscore_CET1WI to construct the Rad_clin model. The Hosmer–Lemeshow test of the Rad_clin model revealed no significant lack of fit in either the training (P=0.594) or the test cohort (P=0.748).
    Conclusions: The Radscore provided by CET1WI may achieve higher diagnostic performance in predicting lymph node metastasis, and superior performance was observed with the Rad_clin model.
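    To make the modelling step concrete, the schematic below (entirely synthetic data, assumed variable names) shows how a CET1WI-derived Radscore and MRI tumour diameter could be combined into a Rad_clin-style logistic model and evaluated by AUC on a held-out split; it mirrors the described workflow rather than reproducing the study.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        radscore = rng.normal(size=200)                 # stand-in for the CET1WI Radscore
        diameter = rng.normal(3.0, 1.0, size=200)       # stand-in for tumour diameter on MRI (cm)
        y = (0.8 * radscore + 0.5 * diameter + rng.normal(size=200) > 2.0).astype(int)  # synthetic labels

        X = np.column_stack([radscore, diameter])
        model = LogisticRegression().fit(X[:140], y[:140])                # roughly a 7:3 split
        auc = roc_auc_score(y[140:], model.predict_proba(X[140:])[:, 1])  # test-cohort AUC
        print(f"test AUC: {auc:.3f}")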

    Chronic disease prevention literacy and its influence on behavior and lifestyle: a cross-sectional study in Xinjiang, China

    No full text
    Objective: To understand the status of chronic disease prevention literacy among Kyrgyz residents and its influencing factors, and to explore the impact of chronic disease prevention literacy on behavior and lifestyle.
    Method: Using a stratified sampling method, Kyrgyz residents aged ≥ 18 years in Artush City, Aheqi County and Ucha County were surveyed by questionnaire.
    Results: A total of 10,468 subjects were investigated, and the chronic disease prevention literacy rate among Kyrgyz residents was 11.2%. Logistic regression analysis showed that the chronic disease prevention literacy rate was lower among people with a low education level, herdsmen, people with low income, urban residents, and people with chronic disease (P < 0.05). Residents with chronic disease prevention literacy were more inclined not to smoke, not to drink alcohol, to drink milk every day, to eat soy products every month, and to eat whole grains every day (P < 0.05).
    Conclusion: The chronic disease prevention literacy of Kyrgyz residents in Kezhou has improved but remains at a low level compared with other subcategories. Behavior and lifestyle are related to the level of chronic disease prevention literacy; therefore, local health promotion strategies should be developed to improve chronic disease prevention literacy and promote the formation of good behavioral and living habits.
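    As a schematic of the analysis described above, the snippet below fits a logistic regression of prevention literacy on a few demographic factors using entirely synthetic data; the variable names and categories are assumptions for illustration only.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(1)
        n = 1000
        df = pd.DataFrame({
            "literate": rng.integers(0, 2, n),          # 1 = has chronic disease prevention literacy
            "education": rng.choice(["primary", "secondary", "higher"], n),
            "occupation": rng.choice(["herdsman", "farmer", "other"], n),
            "income": rng.normal(2000, 500, n),
        })
        model = smf.logit("literate ~ C(education) + C(occupation) + income", data=df).fit(disp=0)
        print(model.summary())                          # odds ratios: np.exp(model.params)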